Systems and methods for consumer-generated media reputation management

ABSTRACT

A computing system configured to gather social media content includes a memory; a content collection and ingestion system, stored in the memory and configured, when executed on a computer processor, to communicate with one or more computing systems to direct a search of a content source using a received collection request and to ingest the results of the directed search into a data store; and a content management system, stored in the memory and configured, when executed on a computer processor, to display the ingested results on a display.

This application is a Continuation of U.S. patent application Ser. No. 13/230,825 filed Sep. 12, 2011, which application claims priority to and the benefit of U.S. Provisional Application Ser. No. 61/381,783 filed Sep. 10, 2010, both of which are hereby incorporated by reference in their entirety as if fully set forth herein.

COPYRIGHT NOTICE

This disclosure is protected under United States and International Copyright Laws. © 2006-2016 Visible Technologies LLC. All Rights Reserved. A portion of the disclosure of the patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure after formal publication by the U.S. Patent Office, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyrights whatsoever.

As used herein, the term “Consumer Generated Media” (hereinafter CGM) is a phrase that describes a wide variety of Internet web pages or sites, which are sometimes individually labeled as web logs or “blogs”, mobile phone blogs or “moblogs”, video hosting blogs or “vlogs” or “vblogs”, forums, electronic discussion messages, Usenet, message boards, BBS emulating services, product review and discussion web sites, online retail sites that support customer comments, social networks, media repositories, audio and video sharing sites/networks and digital libraries. Private non-internet information systems can host CGM content as well, via environments like Sharepoint, Wiki, Jira, CRM systems, ERP systems, and advertising systems. Other acronyms that describe this space are CCC (consumer created content), WSM (weblogs and social media), WOMM (Word of Mouth Media) or OWOM (online word of mouth), and many others.

As used herein, the term “Keyphrase” refers to a word, string of words, or groups of words with Boolean modifiers that are used as models for discovering CGM content that might be relevant to a given topic. A Keyphrase could also be an example image, audio file or video file that has characteristics that would be used for content discovery and matching.

As used herein, the term “Post” refers to a single piece of CGM content. This might be a literal weblog posting, a comment, a forum reply, a product review, or any other single element of CGM content.

As used herein, the term “Site” refers to an Internet site which contains CGM content.

As used herein, the term “Blog” refers to an Internet site which contains CGM content.

As used herein, the term “Content” refers to media that resides on CGM sites. CGM is often text, but includes audio files and streams (podcasts, mp3, streamcasts, Internet radio, etc.), video files and streams, animations (flash, Java) and other forms of multimedia.

As used herein, the term “UI” refers to a User Interface through which users interact with computer software, perform work, and review results.

As used herein, the term “IM” refers to an Instant Messenger, which is a class of software applications that allow direct text-based communication between known peers.

As used herein, the term “Thread” refers to an “original” post and all of the comments connected to it, present on a blog or forum. A discussion thread holds the content display order, i.e., which message came first, which followed it, and so on.

As used herein, the term “Permalink” refers to a URL which persistently points to an individual CGM thread.

The Internet and other computer networks are communication systems. The sophistication of this communication has improved, and the primary modes have differentiated, over time and with technological progress. Each primary mode of online communication varies based on a combination of three basic values: privacy, persistence, and control. Email as a communications medium is private (communications are initially exchanged only between named recipients), persistent (saved in inboxes or mail servers) but lacks control (once you send the message, you can't take it back, or edit it, or limit re-use of it). Instant messaging is private, typically not persistent (some newer clients are now allowing users to save history, so this mode is changing) and lacks control. Message boards are public (typically all members, and often all Internet users, can access your message), persistent, but lack control (they are typically moderated by a central owner of the board). Chat rooms are public (again, some are membership based), typically not persistent, and lack control.

                   privacy   persistence   author control
Chat Rooms/IRC     no        no            no
Instant Messaging  yes       no            no
Forums             no        yes           no
Email              yes       yes           no
Blogs              no        yes           yes
social networks    yes/no    yes           yes
Second Life        yes       yes           yes+

Blogs and Social Networks are the predominant communications mediums that permit author control. By reducing the cost, technical sophistication, and experience required to create and administer a web site, blogs and other persistent online communication have given an unprecedented amount of editorial control to millions of online authors. This has created a unique new environment for creative expression, commentary, discourse, and criticism without the historical limits of editorial control, cost, technical expertise, or distribution/exposure.

There is significant value in the information contained within this public media. Because the opinions, topics of discussion, brands and celebrities mentioned and relationships evinced are typically totally unsolicited, the information presented, if well studied, represents an amazing new source of social insight, consumer feedback, opinion measurement, popularity analysis and messaging data. It also represents a fully exposed, granular network of peer and hierarchical relationships rich with authority and influence. The marketing, advertising, and PR value of this information is unprecedented.

This new medium represents a significant challenge for interested parties to comprehensively understand and interact with. As of Q1 2007, estimates for the number of active, unique online CGM sites (forums, blogs, social networks, etc.) range from 50 to 71 million, with growth rates in the hundreds of thousands of new sites per day. Compared to the typical mediums that PR, Advertising and Marketing businesses and divisions interact with (<1000 TV channels, <1000 radio stations, <1000 major news publications, <10-20 major pundits on any given subject, etc.) this represents a nearly 10,000-fold increase in the number of potential targets for interaction.

Businesses and other motivated communicators have come to depend on software that performs Business Intelligence, Customer Relationship Management, and Enterprise Resource Planning tasks to facilitate accelerated, organized, prioritized, tracked and analyzed interaction with customers and other target groups (voters, consumers, pundits, opinion leaders, analysts, reporters, etc.). These systems have been extended to facilitate IM, E-mail, and telephone interactions. These media have been successfully integrated because of standards (jabber, pop3, smtp, pots, imap) that require that all participant applications conform to a set data format that allows interaction with this data in a predictable way.

Blogs and other CGM generate business value for their owners, both on private sites that use custom or open source software to manage their communications, and for massive public hosts. Because these sites can generate advertising revenue, there is a drive by author/owners to protect the content on these sites, so readers/subscribers/peers have to visit the site, and become exposed to revenue generating advertising, in order to participate in/observe the communication. Because of this financial disincentive, there is no unifying standard for blogs which contains complete data. RSS and Atom feeds allow structured communication of some portion of the communication on sites, but are often very incomplete representations of the data available on a given site. Sites also protect their content from being “stolen” by automated systems with an array of CAPTCHAs (“Completely Automated Public Turing test to tell Computers and Humans Apart”), email verification, mobile phone text message verification, password authentication, cookie tracking, Uniform Resource Locator (URL) obfuscation, timeouts and Internet Protocol (IP) address tracking.

The result is a massively diverse community that would be very valuable to understand and interact with, but which resists aggregation and unified interaction by way of significant technical diversity, resistance to complete information data standards, and tests that attempt to require one-to-one human interaction with content.

BRIEF DESCRIPTION OF THE DRAWINGS

The preferred and alternative embodiments of the present invention are described in detail below with reference to the following drawings.

FIGS. 1A-1B show an example system for consumer generated media reputation management;

FIG. 2 shows a method for consumer generated media reputation management;

FIG. 3 shows incoming data from collection being delivered to an ingestion system in one embodiment;

FIG. 4 is a depiction of one embodiment of a CGM site discovery system;

FIG. 5 provides an overview of ingestion in one embodiment;

FIG. 6 shows manual scoring in one embodiment;

FIGS. 7-9 show the smooth transition between user scoring and automated scoring, in one embodiment;

FIG. 10 is a depiction of one embodiment of a CGM response engine;

FIGS. 11-13 show screen shots of a registration and response feature;

FIG. 14 shows an example screenshot of the TruCast Login Authentication screen;

FIG. 15 shows an example screenshot of a user interface homepage;

FIG. 16 shows an example screenshot of an account manager panel;

FIG. 17 shows an example screenshot of a user manager panel;

FIG. 18 shows an example screenshot of a topic manager panel;

FIG. 19 shows an example screenshot of a topic manager panel with the keyphrase tab activated;

FIG. 20 shows an example screenshot of a sorting manager;

FIG. 21 shows an example screenshot of the sorting manager with the user tab activated;

FIG. 22 shows an example screenshot of a scoring manager;

FIG. 23 shows an example screenshot of a scoring manager with a new topic creator screenshot activated;

FIG. 24 shows an example screenshot of a response manager;

FIG. 25 shows an example screenshot of an administrative queue;

FIG. 26 shows an example screenshot of a dashboard launcher;

FIG. 27 shows an example screenshot of an impact dashboard;

FIG. 28 shows an example screenshot of a sentiment dashboard;

FIG. 29 shows an example screenshot of a sentiment history dashboard;

FIG. 30 shows an example screenshot of an authority map dashboard;

FIG. 31 shows an example screenshot of a data drilldown dashboard;

FIG. 32 shows an example screenshot of an ecosystem map dashboard;

FIG. 33 shows an example screenshot of an ecosystem map zoom out view;

FIG. 34 shows an example screenshot of a sentiment summary;

FIG. 35 shows an example screenshot of a set of top lists;

FIG. 36 shows an example screenshot of reporting;

FIG. 37 shows an example screenshot of an aggregate performance dashboard;

FIG. 38 shows a system overview in detail;

FIG. 39 shows a graphical user interface generated in accordance with an embodiment;

FIG. 40 shows an exemplary screenshot of a graphical user interface according to an embodiment;

FIG. 41 shows an exemplary screenshot of a graphical user interface according to an embodiment;

FIG. 42 shows an exemplary screenshot of a graphical user interface according to an embodiment;

FIG. 43 shows an exemplary screenshot of a graphical user interface according to an embodiment;

FIG. 44 shows an exemplary screenshot of a graphical user interface according to an embodiment;

FIG. 45 illustrates an invoked email message window according to an embodiment;

FIG. 46 shows an exemplary screenshot of a graphical user interface according to an embodiment;

FIG. 47 illustrates a labeled message according to an embodiment;

FIG. 48 shows an exemplary screenshot of a graphical user interface according to an embodiment;

FIG. 49 shows an exemplary screenshot of a graphical user interface according to an embodiment; and

FIG. 50 shows an exemplary screenshot of a dashboard according to an embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Embodiments described herein provide enhanced computer- and network-based methods, techniques, and systems for maintaining social intelligence of the social media space. Exemplary embodiments provide a Social Intelligence System (“SIS”), which enables users to search, monitor, export, compare, discover, engage, and/or manage social media. In one embodiment the SIS includes a content collection and ingestion system, stored in the memory and configured, when executed on a computer processor, to communicate with one or more computing systems to direct a search of a content source using a received collection request and to ingest the results of the directed search into a data store; and a content management system, stored in the memory and configured, when executed on a computer processor, to display the ingested results on a display. One example of such techniques is described in U.S. Pat. No. 7,720,835, filed May 7, 2007, and entitled “SYSTEMS AND METHODS FOR CONSUMER-GENERATED MEDIA REPUTATION MANAGEMENT,” which is incorporated herein by reference in its entirety.

The following sections describe various architectural components, data flows, and other aspects of an example embodiment of an SIS, including various processes used to implement an example SIS. They further describe various alternative or additional techniques that may be employed by various embodiments of an SIS. Other system organizations and process flows could also be used to implement the capabilities of an SIS.

Embodiments include a platform and services designed to enable Global 2000 customers, agencies and integration partners to more profitably engage their markets. Embodiments transform masses of unstructured social media data into actionable business insights that drive purchase behavior, improved customer service and brand loyalty.

In one embodiment, social data is collected from blogs, microblogs, social networks, social news sites, message boards & forums, and social video sites. This includes the social must-have sites: Twitter, Facebook, YouTube and LinkedIn. Details include, but are not limited to, the following examples:

Twitter: Twitter data may be collected according to the specific search terms entered into the saved searches.

Facebook: discussion areas of Facebook.

Blog: access and crawl well over 200 million English-language blogs and 1.3 million foreign language blogs.

Microblogs.

Social Network: discussion and blog areas of Facebook, MySpace, LinkedIn, and other smaller sites.

Forums: nearly 200,000 unique discussion boards and the millions of forums they contain.

Video/Photo: YouTube data may be collected directly via the YouTube API.

News and MSM (mainstream media).

Bookmarking/Sharing.

Reviews/Shopping.

Historical data: dating back to 2005 for blog and forum content.

Content in over 50 languages.

Full post collection: collection of the full thread, or of search-matched posts on a thread.

Social Video: using Google's video search engine with well over 20 sites covered including all YouTube content (truncated results, pinging like Pulse).

Global News feed: over 35,000 news sources, all languages, includes historical, providing essentially “all” public news content (Moreover/Effyis).

Facebook API Integration: connect to at least one, but preferably four, key Facebook Graph APIs to collect wall posts/comments on all public pages.

Global Hotsites: custom collection with a proprietary crawler in priority languages to improve global coverage.

The capability exists for an embodiment to add additional RSS data sources that may be keyword matched.

Data may be collected in languages available for that media type. For example, Twitter is available in English, Japanese, French, Italian, German and Spanish, which is what an embodiment, in turn, is able to collect.

Threaded content is available and includes the original post and all the comments written against that post.

Hyperlinks (aka permalinks) to a video or picture are preferably presented with the post for easy one-click view to the source site.

An embodiment regularly searches the Internet 24×7×365 for keywords to match the saved searches created for each of your workspaces, and also does full-site collection from popular areas of the social web that are hotbeds of social conversation. An embodiment also builds customized templates for sites (aka “hot sites”) that are of relevance to you, which may come at additional costs to implement.

An embodiment has the ability to collect from password-protected sites on a case-by-case basis provided (1) the client gives permission to access them with their user name and password (e.g., LinkedIn), and (2) a special template is constructed for that site and user to implement collection.

Data collection is an ongoing process that takes place 24×7×365. Most content is identified and collected within minutes or hours of appearing publicly on the internet. Once the data is in a system according to an embodiment, it is cleaned and enriched, as discussed in greater detail below, and is available to users anywhere from 30 minutes to 5 hours later.

All content that an embodiment collects from different media sources goes through a common ingestion and QA process that includes: data normalization, de-duplication, SPAM filtering and auto-scoring.
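The first three ingestion steps named above can be illustrated with a minimal sketch. The source does not specify how normalization, de-duplication, or spam filtering are performed, so everything here is an assumption: the function names, the hash-based fingerprint, and the placeholder `SPAM_MARKERS` list are all hypothetical. Auto-scoring is omitted.

```python
import hashlib
import re
import unicodedata

# Hypothetical placeholder phrases; a real spam filter would be far richer.
SPAM_MARKERS = ("buy now", "click here")


def normalize(text: str) -> str:
    """Unicode-normalize, collapse runs of whitespace, and lowercase."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(text: str) -> str:
    """Hash of the normalized text, used as the de-duplication key."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def looks_like_spam(text: str) -> bool:
    """Assumed spam rule: any marker phrase appears in the normalized body."""
    body = normalize(text)
    return any(marker in body for marker in SPAM_MARKERS)


def ingest(posts):
    """Return posts in arrival order, dropping exact duplicates and spam."""
    seen = set()
    kept = []
    for post in posts:
        key = fingerprint(post)
        if key in seen or looks_like_spam(post):
            continue
        seen.add(key)
        kept.append(post)
    return kept
```

Hashing the normalized text only catches posts that are identical after normalization; near-duplicate detection would need a different technique.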

Filtering options for any search include:

Advanced Boolean Logic, which may include:

Keyword and phrase include/exclude

No limit to size of exact phrases

“AND”/“OR”

Grouping

“NEAR” (user specified distance from keyword)

Wildcard searches (single and multiple wild character searches)

Fuzzy searches

Special characters

Domain

Post title

Author name

Media type

Author: Include/Exclude. Multiple values can be entered separated by commas.

Site: Include/Exclude. Multiple values can be entered separated by commas.

Time

Sentiment
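A few of the filtering options listed above (keyword include/exclude, “AND”/“OR”, “NEAR”, and author exclusion) can be sketched as follows. This is an illustrative assumption rather than the described system's query engine: the `Post` and `SearchFilter` types and all field names are hypothetical, terms are single lowercase words, and phrases, wildcards and fuzzy matching are left out.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Post:
    author: str
    site: str
    body: str


@dataclass
class SearchFilter:
    include_all: list = field(default_factory=list)      # "AND" terms
    include_any: list = field(default_factory=list)      # "OR" terms
    exclude: list = field(default_factory=list)          # excluded keywords
    exclude_authors: list = field(default_factory=list)  # excluded authors
    # (term_a, term_b, max_distance_in_words) for a "NEAR" constraint
    near: Optional[Tuple[str, str, int]] = None


def tokens(text):
    """Lowercase word tokens of a post body."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within_distance(words, a, b, distance):
    """True if some occurrence of a is at most `distance` words from b."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


def matches(post, flt):
    """Apply every configured constraint; all of them must hold."""
    words = tokens(post.body)
    vocab = set(words)
    if post.author in flt.exclude_authors:
        return False
    if any(term in vocab for term in flt.exclude):
        return False
    if not all(term in vocab for term in flt.include_all):
        return False
    if flt.include_any and not any(term in vocab for term in flt.include_any):
        return False
    if flt.near and not within_distance(words, *flt.near):
        return False
    return True
```

For example, `SearchFilter(include_all=["battery", "phone"], near=("battery", "great", 4))` keeps only posts that mention both words and place “battery” within four words of “great”.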

Data can be segmented by three principal subjects (author, site and content) and further by sentiment, volume, lists, trend, media type, geo and period. Data can be segmented further with keyword includes and excludes using filters.

In an embodiment, a word cloud identifies the most common words and terms occurring within filtered search results. The cloud is interactive, and as a user clicks on a word or phrase, the search is (temporarily) revised to include that word or term and narrow the search results. This can be a very useful method to quickly determine growing topic trends within a brand or issue before they become viral, and it helps keep the user's finger on the pulse of what people are talking about in specific time periods.
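The word cloud behavior can be sketched in a few lines, assuming posts are plain strings. The function names, the tiny `STOPWORDS` set, and the choice to count each word once per post are illustrative assumptions, not details given in the source.

```python
import re
from collections import Counter

# Hypothetical minimal stop list; a production list would be much longer.
STOPWORDS = {"the", "a", "an", "is", "and", "of", "to", "on", "this", "it", "my"}


def words_in(body):
    """Lowercase word tokens of a post body."""
    return re.findall(r"[a-z']+", body.lower())


def cloud_terms(posts, top_n=5):
    """Most common terms across posts, counting each term once per post."""
    counts = Counter()
    for body in posts:
        counts.update(set(words_in(body)) - STOPWORDS)
    return counts.most_common(top_n)


def narrow(posts, clicked_term):
    """Simulate a click on a cloud term: keep only posts containing it."""
    return [body for body in posts if clicked_term in words_in(body)]
```

Calling `narrow` on the current result set, rather than on the full corpus, mirrors the temporary refinement of the search described above.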

Influencers can be identified using a number of methods in the tool, including the most active authors and authors on top websites.

Analytics provide the ability to identify trends, drill into the drivers behind shifts in the trends and tag appropriate insights to be handled as required by a user's social media team, product innovation team, brand managers and marketing specialists.

Auto-sentiment (aka automated sentiment) is the programmatic review and analysis of status updates, tweets and blog posts through Natural Language Processing and assignment of positive, negative, neutral or mixed sentiment.

An embodiment utilizes proprietary algorithms, rich Natural Language Processing and machine learning to provide industry leading sentiment scoring for every piece of content collected. All data may be automatically scored for sentiment (some competitors only score a sample) as positive, negative, neutral (gray circle) or mixed (yellow face).

Searches may include the ability to search based on multi-word phrases (in quotations) and to use compound search operators and wildcards (proximity like “NEAR” may be included) to find the data you are looking for.

Geographical source of data may be identified at the country level using IP identification (of the hosted domain/URL) and manual site validation for the highest volume sites.

Data may be collected from both localized and global websites. An embodiment collects some specific geographical data at the latitude and longitude level.

In an embodiment, custom tagging provides the ability to define a strategy that works best for a user. These tags can then be added to any post, providing a basis for workflow as tagged posts are filtered and surfaced for users according to the specific tag they are looking for. Any post can also be forwarded as an email from within the application to users that are outside the system, providing exposure to the specific data and enabling action.

Users may engage with Twitter directly through the tool and promote any post to Facebook as well.

Users may promote by clicking the drop down while on a post and then logging in to their selected site, or if logged in, an embodiment copies the body of the post for them with the appropriate @mention or RT notation to help a user save time.

For example, directly from a displayed post, a user may:

Retweet to Twitter

Reply to Twitter

Promote to Facebook

Promote to Digg

Promote to Delicious

Promote to MySpace

Additionally, a user can select to email any post they are reviewing. The post body auto-fills in their email tool and allows them to enter the “To” address.

In an embodiment, the Monitoring Tab is where a user may view saved searches and drill into the data sets, track changes over time and uncover potential insights and areas for deeper research. There are at least one, but preferably at least 27, distinct views that allow a user to drill down from high-level metrics to granular post-level view details. A user can slice and dice data based on multiple pivots:

By Focus areas

Content—what people are talking about

Sites—the places where people are having conversations

Authors—who the people are that are having conversations

By View areas

List

Volume (further drillable with additional pivots like Trend, Geo, Period, Time, Media Type)

Sentiment (further drillable with additional pivots like Trend, Geo, Period, Time, Media Type)

Dashboards can provide the ability to select the Search and Monitor views that are most relevant to the user and combine those views and searches on a single screen view.

When selecting and clicking on a data point, the drill down may show the content (posts) that drove the specific data point value for the respective date frame. So a user may see all of the posts related to that point, for the selected time period.

An embodiment includes an event detection feature called Intelligent Alerts that may track volume and tone changes that are out of the “normal” range as defined. It is highly useful for scenarios like monitoring a potential crisis outbreak, forecasting trends and campaign outcomes, etc.
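One way to picture an out-of-range volume alert is shown below. The source says only that the “normal” range is defined, not how, so this sketch assumes a trailing-window mean and standard deviation with a z-score cutoff. The function name, window size and threshold are hypothetical, and tone changes are not covered.

```python
from statistics import mean, stdev


def volume_alerts(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose volume deviates from the trailing window.

    A day is flagged when it lies more than `threshold` standard deviations
    from the mean of the preceding `window` days (an assumed definition of
    the "normal" range).
    """
    alerts = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is out of range.
            deviates = daily_counts[i] != mu
        else:
            deviates = abs(daily_counts[i] - mu) / sigma > threshold
        if deviates:
            alerts.append(i)
    return alerts
```

Because the window trails the current day, a spike inflates the spread of the following windows and so makes the days right after it harder to flag; a more robust baseline (for example, one based on the median) would reduce that effect.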

According to one or more embodiments, the combination of software or computer-executable instructions with a computer-readable medium results in the creation of a machine or apparatus. Similarly, the execution of software or computer-executable instructions by a processing device results in the creation of a machine or apparatus, which may be distinguishable from the processing device itself, according to an embodiment.

Correspondingly, it is to be understood that a computer-readable medium is transformed by storing software or computer-executable instructions thereon. Likewise, a processing device is transformed in the course of executing software or computer-executable instructions. Additionally, it is to be understood that a first set of data input to a processing device during, or otherwise in association with, the execution of software or computer-executable instructions by the processing device is transformed into a second set of data as a consequence of such execution. This second data set may subsequently be stored, displayed, or otherwise communicated. Such transformation, alluded to in each of the above examples, may be a consequence of, or otherwise involve, the physical alteration of portions of a computer-readable medium. Such transformation, alluded to in each of the above examples, may also be a consequence of, or otherwise involve, the physical alteration of, for example, the states of registers and/or counters associated with a processing device during execution of software or computer-executable instructions by the processing device.

As used herein, a process that is performed “automatically” may mean that the process is performed as a result of machine-executed instructions and does not, other than the establishment of user preferences, require manual effort.

FIG. 1A illustrates an example of a suitable computing system environment 100 on which an embodiment of the invention may be implemented. The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the invention. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.

Embodiments of the invention are operational with numerous other general-purpose or special-purpose computing-system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with embodiments of the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed-computing environments that include any of the above systems or devices, and the like.

Embodiments of the invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. Embodiments of the invention may also be practiced in distributed-computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed-computing environment, program modules may be located in both local- and remote-computer storage media including memory storage devices.

With reference to FIG. 1A, an exemplary system for implementing an embodiment of the invention includes a computing device, such as computing device 100. In its most basic configuration, computing device 100 typically includes at least one processing unit 102 and memory 104.

Depending on the exact configuration and type of computing device, memory 104 may be volatile (such as random-access memory (RAM)), non-volatile (such as read-only memory (ROM), flash memory, etc.) or some combination of the two. This most basic configuration is illustrated in FIG. 1A by dashed line 106.

Additionally, device 100 may have additional features/functionality. For example, device 100 may also include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or tape. Such additional storage is illustrated in FIG. 1A by removable storage 108 and non-removable storage 110. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Memory 104, removable storage 108 and non-removable storage 110 are all examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by device 100. Any such computer storage media may be part of device 100.

Device 100 may also contain communications connection(s) 112 that allow the device to communicate with other devices. Communications connection(s) 112 is an example of communication media. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio-frequency (RF), infrared and other wireless media. The term computer-readable media as used herein includes both storage media and communication media.

Device 100 may also have input device(s) 114 such as keyboard, mouse, pen, voice-input device, touch-input device, etc. Output device(s) 116 such as a display, speakers, printer, etc. may also be included. All such devices are well-known in the art and need not be discussed at length here.

Referring now to FIG. 1B, an embodiment of the present invention can be described in the context of an exemplary computer network system 200 as illustrated. System 200 includes an electronic client device 210, such as a personal computer or workstation, that is linked via a communication medium, such as a network 220 (e.g., the Internet), to an electronic device or system, such as a server 230. The server 230 may further be coupled, or otherwise have access, to a database 240 and a computer system 260. Although the embodiment illustrated in FIG. 1B includes one server 230 coupled to one client device 210 via the network 220, it should be recognized that embodiments of the invention may be implemented using one or more such client devices coupled to one or more such servers.

In an embodiment, each of the client device 210 and server 230 may include all or fewer than all of the features associated with the device 100 illustrated in and discussed with reference to FIG. 1A. Client device 210 includes or is otherwise coupled to a computer screen or display 250. As is well known in the art, client device 210 can be used for various purposes including both network- and local-computing processes.

The client device 210 is linked via the network 220 to server 230 so that computer programs, such as, for example, a browser, running on the client device 210 can cooperate in two-way communication with server 230. Server 230 may be coupled to database 240 to retrieve information therefrom and to store information thereto. Database 240 may include a plurality of different tables (not shown) that can be used by server 230 to enable performance of various aspects of embodiments of the invention. Additionally, the server 230 may be coupled to the computer system 260 in a manner allowing the server to delegate certain processing functions to the computer system.

In one embodiment, the methods: and systems are implemented, by acoordinated software and hardware computer system. This system iscomprised of a set of dedicated networked servers controlled by TruCast.The servers are installed with a combination of commercially availablesoftware, custom configurations, and custom software. A web server isone of those modules, which exposes a web based client-side UI tocustomer web browsers. The UI interacts with the dedicated servers todeliver information to users. The cumulative logical function of thesesystems results in a system and method referred to as TruCast.

In alternate embodiments, the servers could be placed client side, could be shared or publicly owned, and could be located together or separately. The servers could be the aggregation of non-dedicated compute resources from a Peer to Peer (P2P), grid, or other distributed network computing environment. The servers could run different commercial applications or different configurations with the same or similar cumulative logical function. The client to this system could be run directly from the server, could be a client side executable, could reside on a mobile phone or mobile media device, or could be a plug-in to other Line of Business applications or management systems. This system could operate in a client-less mode where only Application Programming Interface (API) or Extensible Markup Language (XML) or Web-Services or other formatted network connections are made directly to the server system. These outside consumers could be installed on the same servers as the custom application components. The custom server-side engine applications could be written in different languages, using different constructs, foundations, architectural methodologies, storage and processing behaviors while retaining the same or similar cumulative logical function. The UI could be built in different languages, using different constructs, foundations, architectural methodologies, storage and processing behaviors while retaining the same or similar cumulative logical function.

FIG. 2 shows a method for consumer generated media reputation management. The TruCast system can be broken down into elements. The elements include, but are not limited to, the following: collection, ingestion, analysis, reporting and response.

Collection

In one embodiment, the Collection system gathers the majority of information about all CGM content online. This is a weighted, prioritized goal because TruCast functions in a weighted, prioritized way. This prioritization system is an optionally advantageous element of the collection system, called the Collection Manager. The Collection Manager receives input from internal and external sources about what sites have information of value, weights that information against a set of pre-described and manipulable co-factors to allow tuning, and prioritizes the execution of collection against those sites.

In order to collect data from a blog site, an automated web scripting and parsing system called a robot is built. An individual “robot” is a sophisticated, coordinated script which informs a software engine of how to navigate, parse, and return web information. Every web site is comprised of code in one of several popular languages, which software applications called web browsers “render” or convert to a visually appealing “web site”. A robot, similar to a browser, interprets site code to render an output. The desired output is not the “web site” that a browser would create, but an XML document, with columns of information about the content stored on a given site. Because robots are accessing the code, and not the rendered page, they have access to markup structures in the code which identify where specific content of interest is stored within the code. Robots use navigation based on Document Object Model (DOM) trees, regular expression pattern matching, conditional parsing, pre-coded transformations, mathematical and logical rules, tags, comments, formatting, and probability statistics to extract the specific content TruCast, in one embodiment, uses from raw web site code. Functions which perform this parsing are abstracted and codified in the robot engine, which is instructed on specific actions by a specific robot script. In pseudo-code, a robot designed to gather all of the blog content on a WordPress site would be scripted thusly: Load X URL, read code until “<bodytext>” is found, return all text until “</bodytext>” is found. If it is found create row 1, store this text in column A row 1. Find link with the word “next” in it, follow this link. Read code until “<bodytext>” is found, return all text until “</bodytext>” is found. If it is found create row 2, store this text in column A row 2.
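By way of non-limiting illustration, the pseudo-code robot script above could be sketched in Python as follows. The function name run_robot, the fetch callable (which stands in for whatever component loads page code for a URL), and the max_pages limit are hypothetical and are not part of the described system; the sketch covers only the regular expression pattern matching path, not DOM navigation or probabilistic extraction.

```python
import re
from typing import Callable, Dict, List, Optional

# Text between the <bodytext> markers named in the pseudo-code.
BODY_RE = re.compile(r"<bodytext>(.*?)</bodytext>", re.DOTALL)
# A hyperlink whose visible text contains the word "next".
NEXT_RE = re.compile(
    r'<a\s[^>]*href="([^"]+)"[^>]*>[^<]*\bnext\b[^<]*</a>', re.IGNORECASE
)


def run_robot(
    fetch: Callable[[str], str], start_url: str, max_pages: int = 100
) -> List[Dict[str, str]]:
    """Follow "next" links from start_url, storing each body text in column A.

    `fetch` is an assumed callable that returns the raw page code for a URL.
    """
    rows: List[Dict[str, str]] = []
    visited = set()
    url: Optional[str] = start_url
    while url and url not in visited and len(visited) < max_pages:
        visited.add(url)
        code = fetch(url)
        body = BODY_RE.search(code)
        if body:  # "If it is found create row N, store this text in column A"
            rows.append({"A": body.group(1).strip()})
        link = NEXT_RE.search(code)
        url = link.group(1) if link else None
    return rows
```

Passing the page loader in as a parameter keeps the sketch independent of any particular network library, and the visited set prevents a robot from looping on sites whose "next" links form a cycle.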

This is a clearly incomplete example, as a plurality of robots have the ability to gather and transform a very complete set of knowable information from every website we visit, including the full body text, author's name, date of the post, permalink to the post, title of the post, its position on the page, how many comments it has, the full information about those comments, including author, date, order, and body, and any hyperlinks, graphics, scripts, emoticons, or other multimedia files included in a post, comment or site. Robots can be designed to gather data from only an individual site, or made more general to accommodate variation amongst similar sites. Robots parse the gamut of non-structured web site code into XML encoded text that meets a predefined data specification of the design. The system, in one embodiment, collects all posts, all comments, and all desired content from every page that a robot visits.

Robots are not limited to these methods for content parsing. Hierarchical temporal memory analysis, probability-based positive heuristics, and structural inference technologies can be used to make robots capable of collecting information from a wider variety of sites.

Some sites have full-data RSS or Atom feeds (different than the typically truncated feeds), for which a specific set of robots exists. We also have data vendors who deliver full-data feeds in several formats; these feeds are converted to the XML data spec by another class of robots. Robots are not limited to web content collection, but represent a scriptable system for parsing and transforming incoming and outgoing data based on pre-defined rules.

FIG. 3 depicts one embodiment of a CGM data collection system. In one embodiment, the first step of this system is to prioritize possible targets for collection. Inputs to this prioritization include, but are not limited to, sites specifically requested by customers (305), the number of responses the system has written to a given site (310), the number of accounts that find content from this site relevant (315), the total count of relevant content available on the site (320), the date of the most recent post written on the site (325) and the historical performance of the system at gathering content from this site (330). The priority database maintains an updated list of co-factors which are calculated priorities for each site based on these inputs. When the Collection Manager (340) determines that it has excess bandwidth/resources to execute more robots, it polls the priority database (335) to determine which robots (345) to run and then executes them. The Collection Manager also stores the records of robot activity so that it can add this information to the priority database (335). Robots, once launched by the Collection Manager, interface with their targets (350) to return XML-formatted CGM content to the Ingestion system (355).
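As a minimal sketch of how the prioritization inputs (305-330) might be combined, the following Python example computes a weighted score per site and selects the highest-priority robots for the available capacity. The weight values, the recency formula, the field names, and the function names priority and next_robots are all assumptions made for illustration; the disclosure does not specify how the co-factors are calculated.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SiteStats:
    url: str
    customer_requested: bool        # input 305
    responses_written: int          # input 310
    accounts_relevant: int          # input 315
    relevant_posts: int             # input 320
    days_since_last_post: float     # derived from input 325
    collection_success_rate: float  # input 330, in the range 0..1


# Hypothetical, tunable co-factor weights.
WEIGHTS: Dict[str, float] = {
    "customer_requested": 5.0,
    "responses_written": 0.5,
    "accounts_relevant": 1.0,
    "relevant_posts": 0.01,
    "recency": 2.0,
    "collection_success_rate": 1.0,
}


def priority(site: SiteStats, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the prioritization inputs; higher means collect sooner."""
    recency = 1.0 / (1.0 + site.days_since_last_post)  # newer posts score higher
    return (
        weights["customer_requested"] * float(site.customer_requested)
        + weights["responses_written"] * site.responses_written
        + weights["accounts_relevant"] * site.accounts_relevant
        + weights["relevant_posts"] * site.relevant_posts
        + weights["recency"] * recency
        + weights["collection_success_rate"] * site.collection_success_rate
    )


def next_robots(sites: List[SiteStats], free_slots: int) -> List[str]:
    """Return the URLs to collect next, given spare capacity for free_slots robots."""
    ranked = sorted(sites, key=priority, reverse=True)
    return [site.url for site in ranked[:free_slots]]
```

Keeping the weights in a single table mirrors the tunable co-factors described above: an operator could adjust one value without touching the scoring logic.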

FIG. 4 is a depiction of one embodiment of a CGM site discovery system. Site discovery is the process of finding the URLs of new blog sites on the Internet. The coordination is performed by the Discovery Robot Manager (372). This system retains performance information of the three methods, and determines what percentage of available resources (CPU time, bandwidth) to spend running each of the three methods in order to discover the most new URLs possible. The Discovery Robot Manager receives input from the Discovery Targets DB (370) which stores all of the information required to execute each of the three methods, most notably the URL targets for each method. This system is fed information from customer or internal research discovered URLs (362), URLs of known search engines (364), URLs found in the post bodies of CGM content (366) and the URLs of the directory pages for each of the major blog hosts (368). Each method uses this information and a script for web interaction, called a robot, to discover new CGM URLs. The first method is called the “Real Estate” method. When the Discovery Robot Manager (372) determines that it is efficient to do so, it will launch a Real Estate robot for a specific search engine (374), and supply it with a list of keywords from all account topics which is held in the Discovery Targets DB (370). This robot will visit the search engine and fill in the search form with each keyword, and gather, by way of regular expression pattern extraction, the URLs of the results from the first 4 pages of results. This information will be delivered in XML format to the de-duplicator (388), which will eliminate known URLs, and then be stored in the Collection Prioritization DB (390) for collection. The second method, Site Search, is very similar to the Real Estate method and uses the same robots, but behaves in a different way with different input. The Real Estate robots use key words from the topics in the accounts.
The Site Search method has a pre-determined list of keyphrases designed to be representative of the full gamut of discussion on the web. The Discovery Robot Manager (372) collects this information from the Discovery Targets DB (370) and executes a Site Search robot, which searches the input keyphrases to retrieve the first 20 pages of results. Because of the much larger number of searches, these robots are designed to heavily obfuscate and avoid patterned interaction with Search Engine servers. The URLs discovered by Site Search robots are delivered to the de-duplicator (388), and from there to the Collection Prioritization DB (390). Site Search robots can also alternately be sent input URLs that are blog sites instead of search engines. Within this context they will visit every hyperlink on the site, searching for new links to previously-unknown sites. This is delivered as new URL output similar to the other methods. The third method, called Host Crawl, uses different robots to visit the directory listing pages on major CGM hosting engines. These directory page URLs are stored in the Discovery Targets DB (370). The Discovery Robot Manager (372) launches a Host Crawl robot (376) which visits a CGM Host directory page (382) and visits all of the hyperlinks on that page, retrieving all of the URLs that are available. This information is sent to the de-duplicator (388) and on to the Collection Prioritization DB (390).

Ingestion

FIG. 5 depicts one embodiment of a data Ingestion system. This system receives input from the XML data outputs of robots launched and administered by the Collection Manager (800). These XML data sources are queued in an Ingestion Queue (805). This queuing process is a buffering function because all of the remaining steps are a stream processing method which requires a steady stream of content to work at maximum efficiency. Due to the dynamic nature of the volume of XML data input, the Ingestion Queue holds a backlog of incoming data and outputs it at a steady rate, currently 500 docs/second. This flow of data is delivered first to a system which compares incoming CGM content information to all previously collected content based on posted date, permalink URL, and post body to ensure that the data does not already exist in the system. This is the de-duplicator (810). Once this system has culled duplicate documents, it hands the remaining documents to a UREF constructor (815) which creates a new unique ID number to easily index and track unique content within the system in one embodiment of the invention. Next, content is delivered to a GMT time aligner, which converts all date and time stamps to be relative to Greenwich Mean Time (820). Next, this XML format information is transformed using an XSLT (825), or Extensible Stylesheet Language Transformations, processor, which reformats the data for rapid delivery into the indexing system and relational DB systems (830). In one embodiment, TruCast performs several cleaning and refining steps upon incoming CGM content enclosed in the XML format. The system eliminates duplicate content using a fuzzy logic comparison between existing stored content and incoming new content based on post body, permalink, and date information.
This comparison is tunable and weighted, where positive matches are clear indicators of duplication, but agreement is required across multiple values to confirm duplication. For example, if two posts came from exactly the same date and time to the second, it's unlikely, but possible, that they are truly different unique posts. If, however, the body text is 90% the same, and the URL is 90% the same, it's extremely unlikely that the two posts are unique. On body text, this comparison includes text clustering analysis, to use word counts as a computationally inexpensive way to further evaluate uniqueness. Content that is malformed or incomplete according to the data spec is removed and warnings sent to the responsible collection manager element. Once a document is determined to be unique, a UREF (unique reference) value is created and appended to it so that there is a relevant single value to index this information within the system. All incoming post dates are aligned to GMT. In one embodiment, TruCast delivers all prepared content into an indexing system which formats the data in such a way that it can be rapidly searched based on relationships to other data, keyword presence, account relevance, and date. This structure includes storage of data within a distributed indexed data repository as well as several SQL databases. Each SQL database is optimized for a different consuming system: the UI, the visualization systems, the reporting and statistics systems, the collection priority database, and the target discovery database, as well as the individual account level data stores.
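The duplicate check and GMT alignment described above can be sketched as follows. This is an illustrative approximation only: it substitutes the Python standard library's difflib.SequenceMatcher for the unspecified fuzzy logic comparison, and the two-of-three agreement rule, the 0.9 default thresholds, and the names Post, to_gmt, similarity and is_duplicate are assumptions rather than details taken from the disclosure.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from difflib import SequenceMatcher


@dataclass
class Post:
    permalink: str
    body: str
    posted: datetime  # must be timezone-aware


def to_gmt(stamp: datetime) -> datetime:
    """Align a timezone-aware timestamp to GMT (UTC)."""
    return stamp.astimezone(timezone.utc)


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def is_duplicate(
    new: Post,
    old: Post,
    body_threshold: float = 0.9,
    url_threshold: float = 0.9,
) -> bool:
    """Treat two posts as duplicates when at least two signals agree.

    The two-of-three rule is an assumption standing in for the tunable,
    weighted comparison described in the text.
    """
    same_time = to_gmt(new.posted) == to_gmt(old.posted)
    body_match = similarity(new.body, old.body) >= body_threshold
    url_match = similarity(new.permalink, old.permalink) >= url_threshold
    return sum([same_time, body_match, url_match]) >= 2
```

Requiring agreement across signals reflects the point made above that a matching timestamp alone is not sufficient evidence of duplication.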

Analysis

In one embodiment, TruCast is designed to determine, with a high degree of confidence, the conceptual relevance of a given piece of CGM content to a “topic” or concept space. Topics can be of any breadth (“War” is just as sufficient a topic as “2002 Chevy Silverado Extended Cab Door Hinge Bolt Rust”). Topics are abstract identifiers of relevance information about a given piece of CGM content. Each topic can also be understood as a list of “keyphrases” or keywords with Boolean modifiers. Each topic can contain an unlimited number of keyphrases that work as the first tier of pattern matching to identify content that is relevant to an individual account. Each post discovered by the system, in one embodiment, could be relevant to one topic, many topics, many topics across many accounts, or no topics at all.

FIG. 6 depicts one embodiment of a system for manually appending topic relevance and topical sentiment to blog posts. This process begins by discovery of potentially relevant content by way of keyphrases. Keyphrases are grouped into topics. Topics and keyphrases are created by users (405) in the Topic Manager panel (410) within the UI. Once a new topic and keyphrase is created, this information is transmitted to the indexing system (415) which begins to examine all incoming data for matches against this keyphrase. The information is also handed to the relational database system (420) which is also the StoleDB component of the Historical Data Processor as illustrated in FIG. 38. This system examines all data that has already been processed to see if it matches this new keyphrase. This separation accelerates both processes because of optimized structure in (415) for stream processing and optimized structure in (420) for narrow, deep searches against a significantly larger database. Information from both of these systems is passed in queue form to the Scoring Manager (425) which provides a UI for users to annotate topic relevance and topic sentiment information which is stored in the relational DB (435). In one embodiment, TruCast contains a user interface that allows users to create topics, create keyphrases that are used to search for potentially relevant posts for that topic, place potentially relevant content into a queue for review, review the text and context of individual content, mark that content as relevant to none, one, or many topics (thereby capturing human judgment of relevance), and store that information in the relational database. This system is called the Scoring Manager.

This method, where a post is matched by keyphrase, scored by humans, and delivered to the outputs of TruCast, in one embodiment (visualizations, reports, and response), is the most basic “manual” behavior of the system.

The behavior of this tiered system of relevance discovery and analysis changes over time to reflect the maturation of the more sophisticated elements of the system as their contextual requirements are much higher. A keyphrase match is absolute; if a post contains an appropriate keyphrase, there is no question as to if a match exists. The Conceptual Categorization system is built to apply a series of exemplar-based prediction algorithms to determine the conceptual relevance of a given post independent of exact keyphrase match. This makes the system, in one embodiment, more robust and provides more human-relevant information. In an exemplary embodiment a blog post body includes the following text: “I really enjoy looking out my windows to see the vista out in front of my house, Buena! it is so great! I wish my computer was so nice, it is a little broken edgy eft sadly.” (EX.1)

A topic for the Microsoft Corporation, looking for the words “windows vista computer” in order to find online discussion about their new operating system, would find this post by keyphrase match, despite the fact that the user discusses using “edgy eft”, which is a code name for Ubuntu 6.08, a competitor's operating system. A topic for Milgard Windows and Doors Corporation that is looking for discussion about windows in need of repair would find this same post, looking for the keyphrase “broken house windows”, despite the fact that clearly the writer is enjoying looking out of his unbroken windows. The Disney Corporation, looking for discussion about their film company “Buena Vista”, would find this post, which has nothing to do with them at all. A biologist researcher looking for references to immature red newts would search for “Eft” only to be sadly disappointed in another result about Ubuntu's software. In all of these cases keyphrase matches have proven insufficient to successfully match relevant content to interested parties. Boolean modifiers help (vista NOT Buena) but consistently fall far short of expectations, and require non-intuitive and time consuming research and expertise.
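The false positives described above are easy to reproduce. The following minimal Python sketch assumes a keyphrase is simply a set of required words plus an optional set of excluded words (the NOT modifier); the function names tokens and matches are hypothetical, and real keyphrase syntax may be richer than this.

```python
import re
from typing import Iterable, Set

EX1 = (
    "I really enjoy looking out my windows to see the vista out in front of "
    "my house, Buena! it is so great! I wish my computer was so nice, it is "
    "a little broken edgy eft sadly."
)


def tokens(text: str) -> Set[str]:
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(post: str, required: Iterable[str], excluded: Iterable[str] = ()) -> bool:
    """True when every required word is present and no excluded word is."""
    words = tokens(post)
    return all(w in words for w in required) and not any(w in words for w in excluded)


# Each of these keyphrases matches EX.1 even though the post is irrelevant to it.
microsoft_hit = matches(EX1, ["windows", "vista", "computer"])
milgard_hit = matches(EX1, ["broken", "house", "windows"])
disney_hit = matches(EX1, ["buena", "vista"])
# A Boolean modifier (vista NOT buena) removes one false positive only.
vista_not_buena = matches(EX1, ["vista"], excluded=["buena"])
```

Here the first three results are True and the last is False, which shows both why keyphrase matching alone over-matches and why a NOT modifier only helps when the author has anticipated the specific confusion.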

Automated Conceptual Categorization

FIGS. 7-9 show the smooth transition between user scoring and automated scoring and depict the progression of the operation of one embodiment for an automated categorization and sentiment analysis system. This progression occurs from the early state, where the automated system performs poorly due to a lack of contextual examples, to a mature state where the automated system performs excellently as a result of robust contextual examples. The system, in one embodiment, reacts to this improvement by reducing the rate of post queue delivery to users and increasing the acceptance of analyzed posts from the automated system as confidence ratings and exemplar set sizes increase. This process accepts input from the ingestion system (350) into two separate queues. The first queue delivers content to the scoring manager (610) where it is scored by humans (615) and then delivered to the per-topic exemplar sets (620) based on topic relevance, the relational database (625) for storage and use in the response, visualization and report sections, and to an agreement analysis system (645). A second queue delivers content to the automated categorization system, which accepts input from the per-topic exemplar sets, as well as topic performance and tuning information from the agreement analysis system (645). This system passes conceptually relevant content to the sentiment analysis systems which also have access to the exemplar and agreement analysis tuning data. The automated systems append a “confidence” score to their evaluations, which is used as a threshold to decide trust in the evaluation's accuracy. In the early behavior of the system, due to the lack of examples and agreement analysis tuning data, often this confidence score is very low. As more manual scoring is completed, and agreement analysis improves, the percentage of data flowing into the automated systems increases, and once performance is proven on the full data stream, the flow of data to the manual scoring application begins to decrease.
Continual tracking by the agreement analysis system accounts for the varying level of inaccuracy that the automated systems can create as a result of changes within topical vernacular, user vocabulary, or new common phrases. Inflections or other changes in the typical word patterns present in incoming CGM content are reflected by the dynamic adjustment of the percentages of data flowing into these two systems. Over time, given sufficient, accurate scoring by humans, the automated systems should be capable of accurate analysis on 100% of incoming documents, which would reduce the role of required human interaction to only providing audit and contemporary vernacular updates by way of minimal scoring. In one embodiment, TruCast contains a Conceptual Categorization system which has functionality to evaluate posts for relevance by way of statistical analysis on examples provided by humans using the scoring system. Because humans are reviewing the content from a specific customer's perspective, that content is reliably scored in context. If the above example post (EX.1) was scored by a human scorer for Microsoft, it would be found irrelevant to the Windows Vista operating system. By statistical analysis of hundreds of posts marked relevant or irrelevant to individual topics, the system can utilize not just keywords, but the entire body of the post to determine relevance. This statistics calculation leverages text clustering assisted by stop words exclusion, noun and pronoun weighting, punctuation observation, and stemming near-word evaluations. For non-text categorization analysis, TruCast, in one embodiment, can leverage Optical Character Recognition (OCR) image to text conversion, Fast-Fourier Transform (FFT) and Granular Synthesis (GS) analysis based speech-to-text conversion, as well as Hierarchical Temporal Memory (HTM) processing.
This comparison, and the resultant threshold filtered probability that a given post is relevant to a given topic, allows TruCast, in one embodiment, to assign this meta-information. This method is vastly more accurate, relative to human analysis, than keyphrase matching. It also has the optionally advantageous feature of being continually tuned by ongoing scoring within the UI, which provides fresh exemplar data over time.
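One simple way to picture exemplar-based categorization with a confidence threshold is shown below. This sketch is an assumption-laden stand-in for the text clustering described above: it uses bag-of-words cosine similarity against pooled relevant and irrelevant exemplars, a tiny hand-picked stop word list, and a confidence defined as the gap between the two similarities. None of the names (vectorize, cosine, pooled, categorize) or the 0.1 threshold come from the disclosure, and weighting, punctuation observation and stemming are omitted.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple

# Illustrative stop word list; a production list would be far larger.
STOP_WORDS = {"the", "a", "an", "is", "it", "to", "of", "my", "i", "and", "in", "so", "was", "out"}


def vectorize(text: str) -> Counter:
    """Count the non-stop words in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pooled(texts: List[str]) -> Counter:
    """Sum the word counts of an exemplar set into one vector."""
    total: Counter = Counter()
    for text in texts:
        total.update(vectorize(text))
    return total


def categorize(
    post: str, relevant: List[str], irrelevant: List[str], threshold: float = 0.1
) -> Tuple[Optional[bool], float]:
    """Return (label, confidence); label is None when confidence is too low.

    A None label means the post should be routed to human scoring.
    """
    vec = vectorize(post)
    s_rel = cosine(vec, pooled(relevant))
    s_irr = cosine(vec, pooled(irrelevant))
    confidence = abs(s_rel - s_irr)
    if confidence < threshold:
        return None, confidence
    return s_rel > s_irr, confidence
```

Because the exemplar sets are passed in on every call, fresh human-scored posts change the result immediately, which is the continual tuning behavior described above.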

Automated Sentiment Analysis

When users score content for relevance in the scoring manager, they also may assert the sentiment of the content for each topic to which it is relevant, from the perspective of their account. Users will mark, from their perspective (as informed by a set of scoring rules described by user administrators), the sentiment reflected about each topic. This information will be stored for later use in a relational database.

These human markup actions serve two purposes. The first is to capture this data for direct use within a response system, and a series of data visualizations that leverage topic and sentiment information to elucidate non-obvious information about the content TruCast collects, in one embodiment. This is the “manual” path for data to flow through the system, in one embodiment. The second use for these posts is that they serve as example data for an exemplar driven automated sentiment analysis system that mirrors the conceptual categorization system.

Similar to the process of categorization, the system, in one embodiment, leverages an exemplar set of documents to perform an automated algorithmic comparison in order to determine the sentiment, per topic, contained within an individual post. This requires a larger number of examples than categorization analysis (˜100 per sentiment value per topic) due to the four different stored sentiment values, “good”, “bad”, “neutral” and “good/bad”. Due to the significant complexity of sentiment language within human language, additional processing is performed upon each document to improve the accuracy of the analysis. A lexicon of sentimental terms is stored within the system, and their presence has a weighted impact on the analysis. Negation terms and phrase structures also alter the values associated with sentimental terms. A stop words list eliminates connective terms, object nouns, and other non-sentimental terms within the text, reducing the noise the comparison has to filter through. Sentence detection uses linguistic analysis to subdivide posts into smaller sections for individual analysis. A series of algorithms are compared for accuracy and performance on a per topic basis, to allow the performance of the analysis system to be tuned to each topic.
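The additional processing steps named above (sentiment lexicon, negation handling, stop word removal and sentence detection) can be illustrated with a small Python sketch. It covers only those lexical steps and not the exemplar comparison itself; the lexicon entries and weights, the negation and stop word lists, the punctuation-based sentence splitting, and the function names are all assumptions chosen for illustration.

```python
import re
from typing import List

# Hypothetical weighted lexicon of sentimental terms.
LEXICON = {"great": 1.0, "love": 1.0, "nice": 0.5, "broken": -1.0, "terrible": -1.0, "hate": -1.0}
NEGATIONS = {"not", "never", "no"}
# Illustrative stop words; negation terms are deliberately kept out of this list.
STOP_WORDS = {"the", "a", "an", "is", "it", "to", "of", "my", "i", "and", "in", "so", "was"}


def sentence_scores(text: str) -> List[float]:
    """Score each sentence; a negation flips the next sentimental term."""
    scores = []
    # Crude sentence detection on terminal punctuation (an assumption).
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = [w for w in re.findall(r"[a-z']+", sentence) if w not in STOP_WORDS]
        if not words:
            continue
        score, negate = 0.0, False
        for word in words:
            if word in NEGATIONS:
                negate = True
            elif word in LEXICON:
                score += -LEXICON[word] if negate else LEXICON[word]
                negate = False
        scores.append(score)
    return scores


def classify(text: str) -> str:
    """Map sentence scores onto the four stored sentiment values."""
    scores = sentence_scores(text)
    has_good = any(s > 0 for s in scores)
    has_bad = any(s < 0 for s in scores)
    if has_good and has_bad:
        return "good/bad"
    if has_good:
        return "good"
    if has_bad:
        return "bad"
    return "neutral"
```

Scoring per sentence rather than per post is what allows the mixed “good/bad” value to be detected when positive and negative statements appear in different sentences.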

Automated Analysis Management

Both of these processes work upon the post-ingestion content, directing automatically analyzed documents into the remainder of the system workflow. This process reacts to the number of exemplar documents that are available. If incoming content is keyphrase-relevant to a specific topic, a determination is made if sufficient exemplar documents have been gathered by the system from users. If enough exemplary documents are not available, that post is delivered to the scoring queue which feeds content to the scoring manager interface. If some documents are present as exemplars, the system will attempt automated categorization and sentiment analysis, but still deliver posts to the scoring manager. This creates a pair of analysis results, one from the computer and one from the user. These are compared, and when a sufficient alignment (agreement frequency) is reached, the system starts delivering auto-analyzed content directly to the reporting and response systems, saving human effort.

This is a sliding ratio from 100% being delivered to the UI and 0% being auto-analyzed, to only 1-10% being delivered to the UI and 100% being auto-analyzed. Once the ratio of content being reviewed by human scorers reaches 10%, and accurate performance of the automated analysis is maintained, mature operation of the automated systems has been achieved. This is the most efficient operation of the system, in one embodiment.
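The sliding ratio can be pictured as a function of exemplar count and human/automated agreement. The following sketch is purely illustrative: the linear slide, the minimum exemplar count of 100, the 0.9 target agreement, and the name human_review_fraction are assumptions, since the disclosure does not state how the ratio is computed.

```python
def human_review_fraction(
    exemplar_count: int,
    agreement: float,
    min_exemplars: int = 100,
    target_agreement: float = 0.9,
    floor: float = 0.10,
) -> float:
    """Return the fraction of incoming posts that should go to human scorers.

    With too few exemplars every post is scored by humans. Otherwise the
    fraction slides linearly from 1.0 down to `floor` as the agreement
    frequency rises from 0 to `target_agreement`.
    """
    if exemplar_count < min_exemplars:
        return 1.0
    if agreement >= target_agreement:
        return floor
    return 1.0 - (1.0 - floor) * (agreement / target_agreement)
```

For example, under these assumed defaults a topic with 500 exemplars and 45% agreement would still send 55% of posts to human scorers, while the same topic at 95% agreement would send only the 10% audit floor.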

The system utilizes an aging and auditing system to ensure that the oldest human scored posts are ejected from the exemplar set and replaced by new human scored posts over time. The system also performs internal cluster analysis and ejects significant outliers from the system. Both of these processes are tunable by administrative control panels. The result of this aging and auditing should be that as the vernacular, word usage, and issues discussed internal to a given topic change over time, exemplar documents continue to reflect that change and accurately map relevance.

Reporting

The system of databases which, in one embodiment, receives topic-relevant, analyzed content is connected to a series of web-based visualizations to allow users of the UI to understand valuable information about the discussions captured by the system. Visualizations are shown in FIGS. 27-38.

Response

FIG. 10 is a depiction of one embodiment of a CGM response engine. In this embodiment the Response Manager UI (752) is populated with a written response by a user (758). This user is evaluated for authorization permissions against a stored value in the Account Database (754). If the user does not have appropriate authorization, their response will be delivered to an authorization queue (756) to be approved by an administrator. If a response is not approved, it is deleted. If a responder has authorization, or their response is approved, it will be delivered to the Response Priority Processor (760) which determines if any delay or promotion is required for a given approved post. It also observes the original posted date of the content that is being responded to and prioritizes based on most recent posted dates. The Response Engine Manager (764) requests responses from the Response Priority Processor (760) to deliver to the registration and response robots. The Response Engine Manager checks the Response Performance DB (766) to see if a given URL has a response robot that has already been created or not. If it has not, the response and all associated information is sent to the Response Robot Constructor (772). This tool provides an interactive UI to allow semi-automated interaction with a target CGM site's registration and response systems to deliver the response to the site, and record the interaction. These interactions include loading pages, following hyperlinks, assigning input data to site form fields, navigating to web mail systems for authentication messages, completing CAPTCHA tests, interacting with IM and SMS systems, performing sequential interactions in correct order and submitting forms. The result of these actions should be a newly registered user (if required by the site) and a response written to the blog site. The interaction is recorded and stored in the Registration and Response Robot sets (770, 774).
If, when the Response Engine Manager is sent a response, it determines that a robot already exists, it will execute that robot without human interaction. This has the same effect, creating a new registration if required, and writing the response to the CGM site. Success or failure of robots and robot constructor actions is recorded in the Response Performance DB for evaluation and manual code re-work if required.
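The routing decisions in FIG. 10 can be summarized in a short Python sketch. The data class, the function names triage, prioritize and dispatch, and the string labels for each destination are hypothetical; delay and promotion rules in the Response Priority Processor are not modeled, only the ordering by most recent posted date.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Set


@dataclass
class Response:
    author: str
    site: str
    text: str
    original_posted: datetime  # posted date of the content being answered
    approved: bool = False     # set by an administrator after review


def triage(response: Response, authorized_users: Set[str]) -> str:
    """Send unauthorized, unapproved responses to the authorization queue (756)."""
    if response.author in authorized_users or response.approved:
        return "priority_processor"
    return "authorization_queue"


def prioritize(responses: List[Response]) -> List[Response]:
    """Order responses so replies to the most recently posted content go first."""
    return sorted(responses, key=lambda r: r.original_posted, reverse=True)


def dispatch(response: Response, sites_with_robots: Set[str]) -> str:
    """Reuse an existing response robot, or fall back to the robot constructor (772)."""
    if response.site in sites_with_robots:
        return "run_robot"
    return "robot_constructor"
```

Separating the three decisions keeps each one independently testable: authorization depends only on account data, ordering only on posted dates, and dispatch only on which sites already have recorded robots.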

The response manager is a system to convert the task of responding to CGM content by way of comments into a manageable, scalable business process. All CGM systems that allow interactivity (>90%) have a web based system for allowing readers of content to respond by way of a comment, note, or other stored message. This often requires that users register themselves on the site by providing a username, password, and other personal details. Sometimes this requires providing an e-mail address, to which an activation link is sent, or an instant messenger account, which is sent a password. This isn't too difficult for casual users to maintain, especially if they only interact with a few sites. Professional users, however, often have to interact with thousands of different sites. The system, in one embodiment, aims to reduce this workload for responders by automating the registration and response process.

Response Workflow

In one embodiment, the TruCast UI system facilitates a workflow for many users to interact in a coordinated, managed way with CGM content. Once a post has been successfully analyzed by either a user in the scoring manager or the automated analysis systems, it becomes available within the response manager. This is a UI system for a user to write a comment in response to relevant posts. The UI has two halves, one which shows information about the post being responded to (author, date, body text, and other comments from within the thread, as well as stats about the author and site responsible for the content), and a second that contains the new response the user is writing. The system provides an interface called the response vault for managers to pre-write message components, fragments of text, names, stats, and pieces of argument that they'd like responders to focus on. These snippets can be copied into the response body during authoring. Once a user is done writing a response, they can click a “send” button which delivers the newly written response to the relational database.

Response Automation

FIGS. 11-13 show screen shots of a registration and response feature. Once the system, in one embodiment, receives a response record from the response manager, it determines which blog site contains the original message, and the link to the response page for that site and message. If the system, in one embodiment, has never written a response to that site before, the record is delivered to the response interactor UI or Response Robot Constructor, which is run by company employees. This UI allows an employee to visit the appropriate site, navigate to the appropriate fields, and assign the information from the record to fields on the site that will cause the site to record a response. This action is recorded, and converted into a script, which is stored as a new robot for later re-use. If TruCast has already written a response to a given site, this script will be used, eliminating the need for repeated human interaction.

This system utilizes a similar engine and scripting methodology as the collection system. Registration and Response robots are scripted automations, which interpret the code of CGM content pages, web pages, pop3 or web based e-mail systems, and other data structures, and perform predetermined, probabilistic, or rule driven interactions with those structures. By interpreting page code and scripted instructions, they can imitate the actions of human users of these structures, by executing on screen navigation functions, inserting data, gathering data, and reporting success or failure. An example registration robot would be given as a data input the registration information for an individual user of the system, in one embodiment, and given the URL to a site that the user wishes to register on. The robot would visit the site, navigate by markers pre-identified in the page code to the appropriate form locations to insert this information, confirm its insertion, and report success, as well as any output information from the site. An example response robot would accept as input the registration information for a given user of the system, in one embodiment, the blog response they've written, and the URL to the site that the user wishes to respond to. The robot would load the site into memory, navigate the page by way of hyperlinks or pre-determined, probabilistic or rule driven information, examine the page source code to discover the appropriate form fields to insert this input data into, do so, and report success. Other embodiments of this solution could include purpose built scripts that perform the same assignment and scripted interaction with CGM sites to perform registration and response tasks. Smaller scale systems would have users perform the manual field entry and navigation tasks, but capture these interactions for conversation involvement identification and maintenance by the analysis systems.
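As a minimal sketch of the step in which a robot examines page source code to discover form fields and reports success or failure, the following uses only the Python standard library. The function name plan_form_fill, the shape of the markers mapping, and the report format are hypothetical; the sketch plans the field assignment and does not submit anything to a site.

```python
from html.parser import HTMLParser


class FormFieldFinder(HTMLParser):
    """Collects the name attribute of every input and textarea on a page."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "textarea"):
            name = dict(attrs).get("name")
            if name:
                self.fields.append(name)


def plan_form_fill(page_html, record, markers):
    """Map record values onto form fields and report success or failure.

    markers maps a record key to the field name pre-identified in the page code.
    """
    finder = FormFieldFinder()
    finder.feed(page_html)
    filled, missing = {}, []
    for key, field_name in markers.items():
        if field_name in finder.fields and key in record:
            filled[field_name] = record[key]
        else:
            missing.append(key)
    return {"success": not missing, "filled": filled, "missing": missing}
```

A marker that cannot be located on the page, or a record key with no value, is reported in the missing list rather than silently skipped, which mirrors the success or failure reporting described above.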

Once the system, in one embodiment, receives a response record from the response manager, it determines which blog site contains the original message, and the link to the response page for that site and message. If the system, in one embodiment, has never written a response to that site before, the record is delivered to the response interactor UI, which is run by company employees. This UI allows an employee to visit the appropriate site, navigate to the appropriate fields, and assign the information from the record to fields on the site that will cause the site to record a response. This action is recorded, and converted into a script for later re-use. If TruCast has already written a response to a given site, this script will be used, eliminating the need for repeated human interaction.

This system utilizes a similar engine and scripting methodology as the collection system. Other embodiments of this solution could include purpose built scripts that perform the same assignment and scripted interaction with CGM sites to perform registration and response tasks. Smaller scale systems would have users perform the manual field entry and navigation tasks, but capture these interactions for conversation involvement identification and maintenance by the analysis systems.

There are several sophisticated systems for preventing automated interaction with registration and response forms on CGM sites. Because TruCast is engine and script driven, and each transaction happens by way of a modular execution system, we can tie the process to outside support modules to defeat these automation prevention systems. The response automation system has a complete pop3 e-mail interaction system which can generate e-mail addresses for use in registration, check those addresses for incoming mail, and navigate the mail content as easily as more typical web content. The response automation system uses advanced OCR processing along with human tuning to defeat CAPTCHA protections. The system has access to Jabber protocol interactions to create automated IM accounts and interact by SMS with mobile phone systems. TruCast also stores a significant body of information, in contact card format, about responders so more complex registration questions can be correctly answered.

Conversation

The response system within TruCast delivers posts to blog sites, which are the target for the collection system. As the system, in one embodiment, collects content, it matches incoming content to evaluate whether that content belongs to a thread that the system has interacted with. When the system discovers posts that were written after a response that TruCast wrote, they are returned to the queue of posts assigned to the user who wrote the response, with a maximum priority. This way a conversation can be facilitated. We also allow review of conversations by way of an Audit Panel, which gives a timeline of interaction for a conversation between a blogger and a TruCast user.
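The thread matching and re-queueing described above can be illustrated with a short sketch. The data shapes (dictionaries keyed by thread and by user) and the convention that a lower number means a higher priority are assumptions made for the example, not details of the disclosed system.

```python
from datetime import datetime

MAX_PRIORITY = 0  # assumption: a lower number sorts first


def requeue_replies(incoming_posts, responses, queues):
    """Put posts that follow a TruCast response at the front of the responder's queue.

    incoming_posts: list of dicts with "thread", "posted_at", and "id"
    responses: dict mapping thread id -> (user, responded_at)
    queues: dict mapping user -> list of (priority, post id)
    """
    for post in incoming_posts:
        hit = responses.get(post["thread"])
        if hit is None:
            continue  # a thread the system has not interacted with
        user, responded_at = hit
        if post["posted_at"] > responded_at:
            queues.setdefault(user, []).append((MAX_PRIORITY, post["id"]))
    for queue in queues.values():
        # list.sort is stable, so arrival order is kept within a priority level
        queue.sort(key=lambda item: item[0])
    return queues
```

Posts that predate the response, or that belong to threads the system never answered, are left out of the responder's queue in this sketch.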

Transparency

Given the volatility of the CGM space, the value it represents, and the danger of negative publicity for any companies or other interested parties who choose to interact by way of responding by comment, it is very important to maintain the appearance of correct attribution. The users are responsible for the content they generate. Because of the sophisticated analysis tools available for CGM site owners to evaluate the source of incoming comments, it's important that the system, in one embodiment, portrays attribution correctly. While using the TruCast system to automate response delivery to blog sites, correct attribution of comment origination is retained.

Indicators of origination include: (1) E-mail address used in registration/response process; (2) Owner of the e-mail address domain as reported by the WHOIS information; (3) Receipt of e-mail sent to this address by the correct customer to the system, in one embodiment; (4) IP Address used in the response/registration process; (5) Reverse DNS lookup on the IP Address used in the response/registration process, and the resultant WHOIS information; and/or (6) Internal consistency of blog user registration information.

Any given customer or user will point a domain name that's appropriate for blog post response, along with its MX record, at a web-accessible server. This server should make available the e-mail addresses hosted on it via a pop3 connection. This resolves issues 1 and 2 by placing ownership of the domain from which the e-mails for registration are generated into the hands of the users.

A forwarding system exists between e-mail addresses created by a robot and the e-mail address listed in the User Manager. Forwarding messages from this TruCast controlled site to the customer's e-mail ensures that customers receive any messages from bloggers that reply by e-mail. This resolves issue 3.

The Response Automation tool receives port 80 from the IP address used for the e-mail server installation, and the server hosts the Response Automation Engine for use in executing the scripting that is created to perform automated response. This resolves issues 4 and 5 by aligning the IP source of the comments with the e-mail source of the comments.

The tool collects significantly more information about responders than is typically necessary. This includes obscure information like birth date, favorite car, mother's maiden name, favorite popsicle flavor, user picture, etc., to ensure that registrations are complete, feature rich, and transparent. The manual response app and robots accept this data in the response and registration steps. This resolves issue 6.

By way of this unified approach to transparency, attribution accuracy should always be retained.

If customers or other users desire misattribution of message source, IP and e-mail anonymization features can be enabled. This obfuscates the source of output messages by way of a rotating IP proxy environment which leverages P2P and onion topologies for maximum opacity.

Administration

It is valuable to keep blog-focused workers on message, saying appropriate things, making persuasive arguments, and being considerate participants in the community. In order to facilitate this, the system, in one embodiment, has a set of authorization features. Administrators have access to a per-user toggle which forces the posts that users write to be delivered to a review queue instead of the response automation system when they press the “send” button. This queue is accessible by administrators to allow review, editing, or rejection before messages are submitted.
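One way to picture the per-user toggle and the review queue is the following sketch. The function names, the use of plain lists as queues, and the decision labels "approve", "edit", and "reject" are illustrative assumptions only.

```python
def route_response(response, author, requires_review, review_queue, automation_queue):
    """Send a response to review or straight to automation, based on the per-user toggle.

    requires_review: dict mapping user name -> bool (the administrator's toggle)
    """
    if requires_review.get(author, False):
        review_queue.append(response)
        return "review"
    automation_queue.append(response)
    return "automation"


def review(review_queue, automation_queue, index, decision, edited_text=None):
    """Apply an administrator decision to a queued response."""
    # Validate before touching the queue so a bad call never drops a response.
    if decision not in ("approve", "edit", "reject"):
        raise ValueError(f"unknown decision: {decision}")
    if decision == "edit" and edited_text is None:
        raise ValueError("an edit decision needs edited_text")
    response = review_queue.pop(index)
    if decision == "reject":
        return None
    if decision == "edit":
        response = edited_text
    automation_queue.append(response)
    return response
```

Approved or edited responses continue to the automation queue, while rejected responses are removed without being submitted.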

Administrators can also create and manipulate sorting rules which prioritize content within user scoring and response queues based on topic, site, engine, author, and date information. This forces users to work on appropriate content, and allows administrators to segment scoring and responding tasks to SMEs who have the most context for a given topic, site, engine or author.

Accounts

Users in the system, in one embodiment, are members of accounts, and afforded permissions within the system based on the role assigned to them by administrative users on a per account basis. Roles are pre-bound permission sets. Administrators can create, edit, and delete everything within the system, except accounts. Group administrators, who have access to multiple accounts, can create accounts, and can edit and delete accounts that they've created or been given access to. System administrators can add, edit, and delete all accounts, so this permission role is reserved for internal support use only. Users within the system, in one embodiment, are intended to perform the majority of the scoring and responding work, and as such have only access to the scoring manager, response manager, and their own user manager to review their own performance. Group users can do these tasks for multiple assigned accounts. Viewers within the system, in one embodiment, have read only access to all UI controls. Group Viewers can review multiple accounts. Accounts as a whole can be enabled or disabled, which blocks users from accessing the system if their account is disabled, and stops any account specific collection, analysis or processing tasks.
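The notion of roles as pre-bound permission sets, assigned per account and gated by whether the account is enabled, can be sketched as below. The permission labels are hypothetical, group roles are modeled simply as the same role held on several accounts, and account creation and deletion are left out of the sketch.

```python
# Hypothetical permission labels; the disclosure names roles, not these strings.
ROLE_PERMISSIONS = {
    "administrator": {"edit_content", "score", "respond", "view"},
    "user": {"score", "respond"},
    "viewer": {"view"},
}


def can(memberships, enabled_accounts, user, account, permission):
    """Check a permission for a user on one account.

    memberships: dict mapping user -> {account: role}
    enabled_accounts: set of accounts that are currently enabled
    """
    if account not in enabled_accounts:
        return False  # a disabled account blocks all of its users
    role = memberships.get(user, {}).get(account)
    return role is not None and permission in ROLE_PERMISSIONS[role]
```

Under these assumptions, a group viewer is a user who holds the viewer role on more than one account, and disabling an account removes access for every member regardless of role.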

FIG. 15 shows an example screenshot of the user interface homepage 1300. The homepage 1300 enables a user to navigate through the different functions of the UI. The toolbar is located at the bottom of the screen and features two menus (account menu and control panel) and a row of eight icons: Account Manager 1305, User Manager 1310, Topic Manager 1315, Sorting 1320, Scoring Manager 1325, Response Manager 1330, Dashboards 1335, and Reporting 1340. The account manager 1305 is used to create/set-up accounts and deactivate/reactivate accounts. The user manager 1310 is used to set-up/create users, establish group rights and permissions, and to review user activity. The topic manager 1315 is used to set-up/create topics and to set-up/create key phrases. Sorting 1320 is used to set-up/create scoring and responding rules for a topic, site, author, engine, and/or date and assign rules to a specific user. The scoring manager 1325 is used to read/score posts and create new topics while scoring a post. The response manager 1330 is used to respond to posts in near real time and create/save personas and pre-determined responses. Dashboards 1335 is used to map and graph sentiment, impact, authority and data. Reporting 1340 is used to display statistical charts. Finally, a control panel 1345 is used to log out of TruCast and allows email to be sent directly to user support.

FIG. 16 shows an example screenshot of the account manager 1305. The account manager is accessed by a user through button 1305 in FIG. 15. The account manager 1305 creates and manages accounts in TruCast. Accounts serve as the logical groups of related users, topics, and other system elements. This creation action establishes a new GUID identified accountID that is used by the backend systems to identify data pertinent to this account. Account is often synonymous with customer for TruCast.

FIG. 17 shows an example screenshot of the user manager 1310. The user manager 1310 allows administrators to set-up users, to assign specific rights/permissions to them and to evaluate their activity in TruCast. This is how a work team is created to address a specific target issue within the CGM space. Each new user is assigned a userID value to track their activities, identify their actions at the database level, enforce permissions and limit access. All users who login to TruCast already have a user ID. The response authorization required flag determines if a user's responses need to be approved by an administrator via the authorization system.

FIG. 18 shows an example screenshot of a topic manager 1315. FIG. 19 shows an example screenshot of the topic manager 1315 with the Keyphrases tab activated. The Topic Manager 1315 is where administrators define topic titles, create topic descriptions, determine key phrases, and exclude specific phrases from the assigned topic. This will determine the content that is matched by the keyphrase tier of relevance analysis in TruCast. Topics are also analysis points, so they're used later to compare and contrast in the visualization systems. Each topic and key phrase has a GUID value distinguishing it within the database systems.

FIG. 20 shows an example screenshot of a sorting manager 1320. FIG. 21 shows an example screenshot of the sorting manager with the users tab activated. Sorting 1320 enables administrators to define scoring and responding guidelines. Administrators can create rules that determine how the scoring or responding queue of either all users or a specific user will be sorted. These sorts impact the queue by matching, so all posts that match the rule are sorted to the top of the queue, which allows users to score items that are of general importance after completing scoring the posts that were specifically assigned to them by an administrator.
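A minimal sketch of queue sorting by rule matching follows. It assumes, for illustration, that a rule is a dictionary of field values that must all equal the post's values, and that posts matching a user-specific rule come before posts matching an all-user rule; the disclosure does not specify a rule format.

```python
def matches(rule, post):
    """A rule matches when every field it names equals the post's value."""
    return all(post.get(key) == value for key, value in rule.items())


def sort_queue(posts, user_rules, global_rules):
    """Order posts: user-specific matches, then all-user matches, then the rest."""
    def rank(post):
        if any(matches(rule, post) for rule in user_rules):
            return 0
        if any(matches(rule, post) for rule in global_rules):
            return 1
        return 2
    # sorted() is stable, so posts with the same rank keep their queue order
    return sorted(posts, key=rank)
```

Because the sort is stable, the original arrival order is preserved inside each of the three groups.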

FIG. 22 shows an example screenshot of a scoring manager 1325. The analysis system, having determined that a post matches either keyphrase or conceptual categorization, and as filtered by the sorting system, delivers posts in a sequential queue to the scoring system. Scoring is the central method for users to impact the function of the automated systems, providing examples and context for their operation, and it's the shortest path for a post to make it from ingestion to visualizations and response. The post is placed in text box 2005, and the topics that the post relates to are in box 2010, which a user will rate using the radio buttons presented. Finally, the site information related to the post is placed in box 2015.

FIG. 23 shows an example screenshot of creating a new topic 2110 in the scoring manager 1325. Because pre-determined topics may not cover the scope or issues that exist in the discussion discovered by TruCast, TruCast allows scoring users to create topics, in the new topic text box 2110, on the fly to capture the observation that a new locus of discussion exists. These topics are not populated with keyphrases at this step. Instead, administrators have the capability to merge and delete topics from the Topic manager to ensure that all the team members who may have simultaneously discovered this new topic can receive direction from the administrator as to what the final topic title will be, and instructions by way of descriptions and scoring rules about how to interpret it.

FIG. 24 shows an example screenshot of a response manager 1330. The output from the analysis system and the scoring manager 1325 feed into the response manager 1330 based on applicable sorting rules as assigned by administrators. Writing the response, in block 2210, and clicking “post” is all that's required to ensure that the message you typed makes it out as a comment on the target site. Your writing process is supported by significant contextual information, from the topic relevance and sentiment score information to stats about the original author and the site they posted on. Once you submit one response, the next item for your review is available immediately in the same panel, with no need to navigate to other pages or sites to find the next place to communicate.

FIG. 25 shows an example screenshot of an administrative queue 2300. The administrative queue tools allow administrators to exercise control over user response activities. These queues can be used for managerial oversight, legal review, tactical analysis, training, feedback and performance auditing. They create the framework for administrative authority over the response process.

FIG. 26 is an example screenshot of a dashboard manager 1335. The Dashboard displays data in dynamic graphical charts and graphs; it maps and reports information based on impact, sentiment, authority and data. This allows users to easily identify critical issues, compare topics of discussion for volume, breadth, depth, tone and interconnectedness of CGM discussions, as well as other useful insights about the CGM space.

FIG. 27 is an example screenshot of an Impact Dashboard 2500, which refers to a set of three line graphs that show daily totals over time and depict the breadth, depth, and participation of the discussions contained within one or many topics. This information is combined with a polar chart that shows the combined values of the three graphs for one period.

FIG. 28 is an example screenshot of a Sentiment Dashboard, which refers to a snapshot view of a single period, showing the relative post volume versus the average sentiment of your selected topics. FIG. 29 is an example screenshot of a Sentiment History Dashboard. This display is connected to a history view which displays this information over time.

FIG. 30 is an example screenshot of an Authority Map Dashboard, which refers to a node and edge style interactive display that shows the interconnectedness and relative authority of individual authors within a given topic. It shows the topic as the center node, sites that contain relevant content as first edge nodes, and authors as second edge nodes. Edges between authors connote comments, links, quotes, and trackbacks as methods of identifying connection and communication. A list view on the right side of the screen allows you to quickly find specific authors or sites within the display. Adjustable level of depth controls allow users to establish constraints (show only authors with more than 2 links, show only positive authors, etc.) that affect the visibility of nodes in the display.

FIG. 31 is an example screenshot of a Data Dashboard, which refers to a display that shows a tabular result set of posts that matched the topics selected. This table shows one post per row with columns for date, author name, permalink, site name, sentiment, and topic. This view can show only information based on keyphrase relevance, or fully analyzed relevance, or show those two together. In several other dashboards there are links to more information about a given topic or author. Those links point to this display.

FIG. 32 is an example screenshot of an Ecosystem Map, which refers to a node and edge style display of all of the sites that make up the discussion ecosystem for a given topic or topics. It shows a node for each site that contains posts or comments relevant to the topics selected and date ranges selected in the dashboard launcher panel. Between nodes, there should be an edge for each link that connects nodes together.

FIGS. 31-32 show example screenshots of nodes, which are size scaled depending on how many posts/relevant posts they have, and colored by average sentiment. Edges are thicker depending on how many links exist between two nodes, and have size scaled arrows showing the predominant direction or ratio of links. Nodes, if clicked on, should show the site name, # of posts total, # of relevant posts, and sentiment %. The name is a hyperlink to the site. By selecting an individual topic, a more detailed display with the sites and authors most important to a given topic is displayed. Double-clicking on a node would lead to the data dashboard with a list of all the titles and permalinks to the relevant posts on that site. Edges, if clicked on, show the # of links represented and % directionality.
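The node and edge statistics behind such a display could be aggregated along the following lines. This is a sketch under stated assumptions: sentiment is taken to be a number per post, average sentiment is computed over relevant posts only, and directionality is expressed as the fraction of links running from the alphabetically first site to the second. None of these conventions comes from the disclosure.

```python
from collections import Counter


def build_ecosystem(posts, links):
    """Aggregate per-site node stats and per-pair edge stats.

    posts: list of dicts with "site", "relevant" (bool), and "sentiment" (float)
    links: list of (source_site, target_site) pairs
    """
    nodes = {}
    for post in posts:
        node = nodes.setdefault(
            post["site"], {"posts": 0, "relevant": 0, "sentiment_sum": 0.0}
        )
        node["posts"] += 1
        if post["relevant"]:
            node["relevant"] += 1
            node["sentiment_sum"] += post["sentiment"]
    for node in nodes.values():
        total = node.pop("sentiment_sum")
        # Sites with no relevant posts have no average sentiment to color by.
        node["avg_sentiment"] = total / node["relevant"] if node["relevant"] else None

    edges = {}
    for (src, dst), count in Counter(links).items():
        pair = tuple(sorted((src, dst)))  # one undirected edge per site pair
        edge = edges.setdefault(pair, {"links": 0, "forward": 0})
        edge["links"] += count
        if (src, dst) == pair:
            edge["forward"] += count
    for edge in edges.values():
        edge["directionality"] = edge["forward"] / edge["links"]
    return nodes, edges
```

A renderer could then scale node size from the post counts, color nodes from avg_sentiment, and set edge thickness and arrow size from links and directionality.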

FIG. 35 is an example screenshot of a Sentiment Summary, which refers to a single topic display that shows the number of authors per sentiment category on a given topic or sum of topics.

FIG. 36 is an example screenshot of Top Lists. This provides users with a set of ranked lists of sites, authors, and posts that are the most relevant, most popular, most negative, most positive, most authoritative, most influential, most linked to, most commented on or most responded to, depending on user selection.

FIG. 37 is an example screenshot of Reporting. The reporting system provides a series of charts based on selection criteria revolving around CGM content: daily or total values of posts by keyphrase match or post-analysis match, per topic or topics, site, author, and date range. Performance metrics on scorers and responders are also available, per site, topic, or date range.

FIG. 38 is an example screenshot of an Aggregate Performance Dashboard. This dashboard supplies a cluster of configurable widgets for tracking the relationships between several KPIs associated with the data available within TruCast, in one embodiment.

FIG. 39 is a graphical user interface 3900 generated in accordance with an embodiment. The interface 3900 includes a view selection menu 3905, a filter selection menu 3910 and a timeframe selector 3915. As illustrated in FIG. 39, selection from the view menu 3905 of the “volume” setting, the timeframe selector 3915 of the “7 days” setting, and the filter menu 3910 of the “trend” setting yields a graph 3920 of the number of times a user-selected search term/phrase has been mentioned, as determined by a system according to an embodiment described herein, over a seven-day span.
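The data series behind a seven-day volume trend graph can be illustrated with a short sketch that counts mentions per day. The function name daily_volume and the representation of mentions as a list of dates are assumptions made for the example.

```python
from collections import Counter
from datetime import date, timedelta


def daily_volume(mention_dates, end, days=7):
    """Count mentions per day for the window of `days` days ending on `end` (inclusive)."""
    start = end - timedelta(days=days - 1)
    counts = Counter(d for d in mention_dates if start <= d <= end)
    # Emit every day in the window, including days with zero mentions.
    return [
        (start + timedelta(days=i), counts.get(start + timedelta(days=i), 0))
        for i in range(days)
    ]
```

Days without mentions are reported with a count of zero so that the plotted line covers the whole span rather than skipping empty days.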

FIG. 40 is an exemplary screenshot of the graphical user interface 3900 according to an embodiment. As illustrated in FIG. 40, selection from the view menu 3905 of the “volume” setting, the timeframe selector 3915 of the “7 days” setting, and the filter menu 3910 of the “geo” (i.e., geographical illustration) setting yields a graphical illustration 4000 indicating in color coding, in the illustrated embodiment, the number of times a user-selected search term/phrase has been mentioned, as determined by a system according to an embodiment described herein, over a seven-day span according to pre-defined geographic regions. A textual table 4010 indicating the number of times a user-selected search term/phrase has been mentioned over a seven-day span according to pre-defined geographic regions may also be generated.

FIG. 41 is an exemplary screenshot of the graphical user interface 3900 according to an embodiment. As illustrated in FIG. 41, selection from the view menu 3905 of the “volume” setting, the timeframe selector 3915 of the “7 days” setting, and the filter menu 3910 of the “period” setting yields a bar graph 4100 illustrating a comparison of the number of times a user-selected search term/phrase has been mentioned, as determined by a system according to an embodiment described herein, over two different user-selected seven-day spans.

FIG. 42 is an exemplary screenshot of the graphical user interface 3900 according to an embodiment. As illustrated in FIG. 42, selection from the view menu 3905 of the “volume” setting, the timeframe selector 3915 of the “7 days” setting, and the filter menu 3910 of the “media type” setting yields a bar graph 4200 illustrating a comparison of the number of times a user-selected search term/phrase has been mentioned, as determined by a system according to an embodiment described herein, according to the type of pre-defined social or other media over a seven-day span.

FIG. 43 is an exemplary screenshot of the graphical user interface 3900 according to an embodiment. As illustrated in FIG. 43, selection from the view menu 3905 of the “sentiment” setting, the timeframe selector 3915 of the “7 days” setting, and the filter menu 3910 of the “trend” setting yields a graph 4300 illustrating a comparison, over a user-selected seven-day span, of the number of times a user-selected search term/phrase has been mentioned, as determined by a system according to an embodiment described herein, according to the sentiment expressed in connection with the search term/phrase.

FIG. 44 is an exemplary screenshot of the graphical user interface 3900 according to an embodiment. As illustrated in FIG. 44, selection from the view menu 3905 of the “list” setting, the timeframe selector 3915 of the “7 days” setting, and the filter menu 3910 of the “content detail” setting yields a user-selected message 4400 of a plurality of messages, posted within a user-selected seven-day span, concerning a user-selected search term/phrase. In an embodiment, the message 4400, as well as any other user-selected message of the plurality, is displayed to include an actions menu 4410.

Referring to FIG. 45, selection of the “email” option from the actions menu 4410 causes an email application associated with the computing device by which the interface 3900 is generated to invoke an email message window 4500. Window 4500 enables the user to email the contents of message 4400 to the recipient of his/her choice.

Referring to FIG. 46, selection of the “tags” option from the actions menu 4410 invokes a tag menu 4600. Selection by the user of an option from the tag menu 4600 enables the user to label message 4400 in a manner designating the message as belonging to a particular pre-defined category or otherwise as requiring some type of follow-up action. For example, referring to FIG. 47, selection of the “opportunity” option from the tag menu 4600 causes a corresponding label 4700 to be displayed within message 4400.

Referring to FIG. 48, selection of the “promote” option from the actions menu 4410 invokes a promote menu 4800. Selection by the user of an option from the promote menu 4800 enables the user to post/tweet message 4400 in accordance with a social networking or other platform corresponding to the selected promote-menu option.

FIG. 49 is an exemplary screenshot of the graphical user interface 3900 according to an embodiment. In the example illustrated in FIG. 49, interface 3900 includes a comparison menu 4905 from which a user can select one or more search terms/phrases. Selection of an option from the comparison menu 4905 enables the simultaneous graphical illustration of content data pertaining to multiple search terms/phrases. As illustrated in FIG. 49, for example, selection from the view menu 3905 of the “volume” setting, the timeframe selector 3915 of the “7 days” setting, the filter menu 3910 of the “trend” setting, and the comparison menu 4905 of the “Lowe's” setting yields a graph 4900 illustrating a comparison, over a seven-day span, of the number of times each user-selected search term/phrase has been mentioned, as determined by a system according to an embodiment described herein. This simultaneous illustration of data pertaining to multiple search terms/phrases may be configured by interface 3900 in a manner similar to that discussed above herein with regard to FIGS. 39-48 in connection with singular search terms/phrases.

FIG. 50 is an exemplary screenshot of a dashboard 5000 according to an embodiment of the invention. As illustrated in FIG. 50, dashboard 5000 may comprise a conglomeration of multiple ones of the graphical illustrations illustrated in and discussed with reference to FIGS. 39-49.

While the preferred embodiment of the invention has been illustrated and described, as noted above, many changes can be made without departing from the spirit and scope of the invention. Accordingly, the scope of the invention is not limited by the disclosure of the preferred embodiment. Instead, the invention should be determined entirely by reference to the claims that follow.

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:

A computing system configured to gather social media content, comprising:

a memory;

a content collection and ingestion system, stored in the memory and configured, when executed on a computer processor, to communicate with one or more computing systems to direct a search of a content source using a received collection request and to ingest the results of the directed search into a data store; and

a content management system, stored in the memory and configured, when executed on a computer processor, to display the ingested results on a display.
